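Between 2015 and 2019, members of the Horizon 2020-funded Innovative Training Network "AMVA4NewPhysics" studied the customization and application of advanced multivariate analysis methods and statistical learning tools to high-energy physics problems, and also developed entirely new ones. Many of these methods have been used successfully to improve the sensitivity of data analyses performed by the ATLAS and CMS experiments at the CERN Large Hadron Collider; several others, still in the testing phase, promise to further improve the precision of measurements of fundamental physics parameters and the reach of searches for new phenomena. This paper presents the most relevant new tools among those studied and developed, along with an evaluation of their performance.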
Large language models (LLMs) have shown impressive results across a variety of tasks while requiring little or no direct supervision. Further, there is mounting evidence that LLMs may have potential in information-seeking scenarios. We believe the ability of an LLM to attribute the text that it generates is likely to be crucial for both system developers and users in this setting. We propose and study Attributed QA as a key first step in the development of attributed LLMs. We develop a reproducible evaluation framework for the task, using human annotations as a gold standard and a correlated automatic metric that we show is suitable for development settings. We describe and benchmark a broad set of architectures for the task. Our contributions give some concrete answers to two key questions (How to measure attribution? and How well do current state-of-the-art methods perform on attribution?), and give some hints as to how to address a third key question (How to build LLMs with attribution?).
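While many methods purport to explain predictions by highlighting salient features, what aims these explanations serve and how they ought to be evaluated often go unstated. In this work, we introduce a framework to quantify the value of explanations via the accuracy gains that they confer on a student model trained to simulate a teacher model. Crucially, the explanations are available to the student during training, but are not available at test time. Compared to prior proposals, our approach is less easily gamed, enabling principled, automatic, model-agnostic evaluation of attributions. Using our framework, we compare numerous attribution methods for text classification and question answering, and observe quantitative differences that are consistent (to a moderate to high degree) across different student model architectures and learning strategies.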